Kriging is an intriguing model for spatial interpolation. With kriging, we can estimate the number of crimes across an area from a limited set of observed data points. Kriging models the spatial relationship between points and assigns less weight to points that are farther apart, as shown in the equation below.1
\hat{Z}(s_0) = \sum_{i = 1}^{N}\lambda_i Z(s_i)
where
Z(s_i) = the measured value at the ith location
\lambda_i = the weight for the measured value at the ith location
s_0 = the prediction location
N = the number of measured values
Unlike simpler methods such as inverse distance weighted (IDW) interpolation or linear regression, kriging interpolates based on the spatial distribution of the empirical observations (the data points we have) rather than assuming a theoretical distribution. The interpolation ultimately produces a prediction surface map.
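To make the weighted sum in the equation above concrete, here is a tiny sketch in Python (rather than the document's R). The values and weights are invented for illustration; real kriging derives the weights from the fitted variogram rather than setting them by hand.

```python
def kriging_estimate(values, weights):
    """Weighted sum of measured values Z(s_i); kriging weights sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9, "kriging weights must sum to 1"
    return sum(w * z for w, z in zip(weights, values))

# Three measured crime counts; the nearer points get the larger (made-up) weights
z = [12.0, 8.0, 20.0]
lam = [0.5, 0.3, 0.2]
print(round(kriging_estimate(z, lam), 2))  # 12.4
```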
Kriging analysis broadly consists of three parts:
Pre-process data
Create variogram
Make a prediction
Variogram
A variogram, also known as a semi-variogram, is a plot of semi-variance, which is half of the mean squared difference in the values of paired locations. At distance h between location i and location j, Semivariogram(distance_h) = 0.5 * average((value_i - value_j)^2). For each pair of points, the semi-variance is plotted against the distance between the points. The variogram therefore shows how the similarity between pairs of points decays with distance.
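The binned semi-variance calculation can be sketched in a few lines of Python (rather than the document's R/gstat); the four locations and values below are fabricated purely to show semi-variance growing with distance.

```python
from itertools import combinations
from math import hypot

def empirical_semivariogram(points, values, bin_width):
    """Average semi-variance per distance bin: 0.5 * mean((z_i - z_j)^2)."""
    sums, counts = {}, {}
    for i, j in combinations(range(len(points)), 2):
        d = hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
        b = int(d // bin_width)  # which distance bin this pair falls into
        sums[b] = sums.get(b, 0.0) + 0.5 * (values[i] - values[j]) ** 2
        counts[b] = counts.get(b, 0) + 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}

# Four fabricated locations and values
pts = [(0, 0), (1, 0), (0, 1), (3, 0)]
vals = [2.0, 3.0, 2.5, 7.0]
print(empirical_semivariogram(pts, vals, bin_width=1.5))
# {0: 0.25, 1: 8.0, 2: 11.3125}  -- semi-variance rises with distance
```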
After building the variogram, the model that best fits it is selected, as shown below. While it is possible to specify a model of interest, it is also possible to let the software choose the best-fit model. Kriging then uses the fitted variogram values to make predictions at unsampled locations.
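As one concrete example of a candidate model, here is a minimal Python sketch of the spherical variogram (the "Sph" candidate passed to the fitting step later in this analysis), assuming a standard nugget / partial-sill / range parameterisation; the parameter values below are made up.

```python
def spherical(h, nugget, psill, rng):
    """Spherical variogram model for lag h > 0 (nugget, partial sill, range)."""
    if h >= rng:
        return nugget + psill  # beyond the range, the curve stays flat at the sill
    return nugget + psill * (1.5 * h / rng - 0.5 * (h / rng) ** 3)

# Semi-variance rises with distance, then plateaus at the sill (nugget + psill)
for h in [2.0, 5.0, 10.0, 15.0]:
    print(h, round(spherical(h, nugget=0.1, psill=1.0, rng=10.0), 4))
```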
Pre-Process Data
Set Up
ASB (anti-social behaviour) has shown positive spatial autocorrelation, so we will continue working with ASB.
Code
#--Install / load packages
pacman::p_load(sp, sf, data.table, rio, here, leaflet, gstat, tidyverse, Metrics, scales, corrr, ggcorrplot, FactoMineR, factoextra, corrplot)

#--Import street-level asb data
asb <- import(here("3_output", "asb_with_nearest_distances.csv"))

#--Calculate count of crimes per location coordinate
asb_count <- asb |>
  group_by(location.latitude, location.longitude) |>
  count() |>
  ungroup() |>
  inner_join(asb, by = c('location.latitude', 'location.longitude')) |>
  distinct(location.latitude, location.longitude, .keep_all = TRUE) |>
  group_by(location.latitude, location.longitude) |>
  mutate(location_id = cur_group_id()) |>
  ungroup()

#--Rename columns
names(asb_count)[grepl('longitude', names(asb_count))] <- 'x'
names(asb_count)[grepl('latitude', names(asb_count))] <- 'y'

#--Convert dataframe to sf object with WGS84 coordinates
asb_count_sf <- asb_count |>
  st_as_sf(coords = c('x', 'y'), crs = 4326)

#--Get coordinates
asb_count_sf <- asb_count_sf |>
  mutate(x = st_coordinates(asb_count_sf)[, 1],
         y = st_coordinates(asb_count_sf)[, 2])

#--Change the sf back to df
asb_count <- st_drop_geometry(asb_count_sf)
The very first step of the kriging analysis is to pre-process the data into count data by location. At every unique coordinate pair of longitude and latitude (or x and y), the number of anti-social behaviour (ASB) incidents was counted.
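The same aggregation can be sketched in Python with a Counter over coordinate pairs (the coordinates below are fabricated for illustration):

```python
from collections import Counter

# Each tuple is one recorded incident at a (latitude, longitude) pair
incidents = [(51.60, -0.20), (51.60, -0.20), (51.58, -0.21), (51.60, -0.20)]

# Count of incidents per unique coordinate, like the group_by() |> count() step
counts = Counter(incidents)
print(counts[(51.60, -0.20)])  # 3
```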
Split Data into Test & Train Sets
Code
#--Create random indices
total_rows <- nrow(asb_count)
sample_size <- round(total_rows * 0.75)
set.seed(1234) # for reproducibility
random_indices <- sample(1:total_rows, sample_size, replace = FALSE)

#--Create the training set using the random indices
train_asb <- asb_count |>
  filter(location_id %in% random_indices)

# Create the test set by excluding the indices used for the training set
test_asb <- asb_count |>
  filter(!location_id %in% random_indices)

# Convert train and test sets to sp objects
coordinates(train_asb) <- c("x", "y")
proj4string(train_asb) <- CRS("+proj=longlat +datum=WGS84")
coordinates(test_asb) <- c("x", "y")
proj4string(test_asb) <- CRS("+proj=longlat +datum=WGS84")
The count data of ASB at each unique location was randomly split into train and test sets at a ratio of 75% to 25% in order to evaluate the kriging model.
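The split logic can be sketched in Python as follows (mirroring the R `sample()` call; the location ids here are fabricated, and the seed is fixed only for reproducibility):

```python
import random

def split_by_location(location_ids, train_frac=0.75, seed=1234):
    """Randomly assign location ids to train/test sets without replacement."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    ids = list(location_ids)
    train = set(rng.sample(ids, round(len(ids) * train_frac)))
    return [i for i in ids if i in train], [i for i in ids if i not in train]

train_ids, test_ids = split_by_location(range(1, 101))
print(len(train_ids), len(test_ids))  # 75 25
```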
Create Variogram
Code
#--Create variogram
vgm <- variogram(log(n) ~ x + y, train_asb, width = 0.1)

#--Create the fitted curve line
fit <- fit.variogram(vgm, vgm(c("Gau", "Sph", "Mat", "Exp")), fit.kappa = TRUE)

#--Plot the curve function
plot(vgm, main = "Variogram")
Code
plot(fit, main ="Variogram Model Fit", cutoff =max(vgm$dist))
Code
# When the curve plateaus (reaches the sill), point pairs beyond that distance (the range) are no longer spatially correlated
Create Grid
We need to create a surface onto which the kriging model can make predictions. In other words, we need all possible points within the boundary of Barnet, which we can obtain by dividing the area into very small grid cells of equal size and extracting their centroid points.
Code
#--Import Barnet shapefile
bnt_shp <- sf::st_read(here("1_data", "9_geo", "bnt_lad.json"), crs = 4326) |>
  st_make_valid() |>
  st_transform(27700) # reproject onto British National Grid, whose unit is metres
#> Reading layer `OS - BoundaryLine - 2022Authorities - Barnet' from data source
#> `C:\Users\Hannah.Chang\OneDrive - London Borough of Barnet\General - I&I Hub\02. Project Documentation\07. Standard Projects\ARC_LBB Website\arc_lbb_website\1_data\9_geo\bnt_lad.json'
#> using driver `TopoJSON'
#> Simple feature collection with 1 feature and 17 fields
#> Geometry type: POLYGON
#> Dimension: XY
#> Bounding box: xmin: -0.3055738 ymin: 51.55528 xmax: -0.1291338 ymax: 51.67021
#> Geodetic CRS: WGS 84

#--Make grid of 100m x 100m
grid <- bnt_shp |>
  st_make_grid(cellsize = units::as_units(100, "m"), what = "centers") |>
  st_as_sf(crs = 27700)

#--Filter grid points to include only those within the Barnet polygon
result <- st_within(grid, bnt_shp) |>
  as.data.frame()
grid_bnt <- grid |>
  mutate(row.id = 1:nrow(grid)) |>
  left_join(result) |>
  filter(!is.na(col.id))

#--Reproject to WGS84
grid_bnt_wgs84 <- grid_bnt |>
  st_transform(4326) |>
  rename(geometry = x)

#--Add longitude and latitude
grid_bnt_wgs84 <- grid_bnt_wgs84 |>
  mutate(x = st_coordinates(grid_bnt_wgs84)[, 1],
         y = st_coordinates(grid_bnt_wgs84)[, 2])

#--Check
ggplot() +
  geom_sf(data = grid_bnt, alpha = 0.3, colour = "#00AFA9") +
  geom_sf(data = bnt_shp, alpha = 0, lwd = 2, colour = "black") +
  ggtitle("All the points in the grids within Barnet boundary") +
  theme_minimal()
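The grid-of-centroids idea can be sketched in Python over a plain bounding box (the real analysis uses `sf::st_make_grid` and clips to the Barnet polygon; the toy rectangle and cell size below are stand-ins):

```python
def grid_centres(xmin, ymin, xmax, ymax, cell):
    """Centre points of square cells tiling the bounding box, row by row."""
    points = []
    y = ymin + cell / 2
    while y < ymax:
        x = xmin + cell / 2
        while x < xmax:
            points.append((x, y))
            x += cell
        y += cell
    return points

# A 500m x 500m box with 100m cells gives a 5 x 5 = 25-point prediction surface
centres = grid_centres(0, 0, 500, 500, 100)
print(len(centres), centres[0])  # 25 (50.0, 50.0)
```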
Perform Kriging
Code
#--Define components of kriging model
predictors <- c("x", "y")
formula_string <- paste("log(n)", "~", paste(predictors, collapse = " + "))
krige_formula <- as.formula(formula_string)

# Perform kriging over the grid
kriged_result_grid <- krige(
  formula = krige_formula,
  locations = train_asb,
  newdata = grid_bnt_wgs84,
  model = fit,
  maxdist = 10,
  nmax = 50
)
#> [using universal kriging]

# Retrieve interpolated values for the grid
grid_fin <- grid_bnt_wgs84 |>
  select(-contains("id")) |>
  mutate(krige_pred = exp(kriged_result_grid$var1.pred),
         krige_var = exp(kriged_result_grid$var1.var))

# Visualise on static map
ggplot() +
  geom_sf(data = grid_fin, aes(colour = krige_pred)) +
  scale_colour_gradient(low = "green", high = "red", name = "Predicted Value") +
  geom_sf(data = bnt_shp, alpha = 0, lwd = 1.5, colour = "black") +
  ggtitle("Hotspot Map for Anti-Social Behaviour Predicted by Kriging Model") +
  theme_minimal()
# Perform kriging over the test set
kriged_result_test <- krige(
  formula = krige_formula,
  locations = train_asb,
  newdata = test_asb,
  model = fit,
  maxdist = 10,
  nmax = 50
) |>
  st_as_sf()
#> [using universal kriging]

# Retrieve interpolated values for the test set
validation <- test_asb |>
  st_as_sf() |>
  select(n) |>
  mutate(krige_pred = exp(kriged_result_test$var1.pred),
         krige_var = exp(kriged_result_test$var1.var))

# Calculate RMSE to evaluate model performance
rmse_asb <- Metrics::rmse(actual = validation$n,
                          predicted = validation$krige_pred)
print(paste0("Root mean square error of the kriging model is ", round(rmse_asb, 2)))
#> [1] "Root mean square error of the kriging model is 11.94"

min_asb <- min(validation$krige_pred)
max_asb <- max(validation$krige_pred)
nrmse <- rmse_asb / (max_asb - min_asb)
print(paste0("Normalised root mean square error of the kriging model is ", round(nrmse, 2)))
#> [1] "Normalised root mean square error of the kriging model is 0.44"
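The two error metrics reported above reduce to a few lines; here is a pure-Python sketch with fabricated observed/predicted values (note that, as in the R code, the normalisation divides by the range of the predictions):

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean square error of predictions against observations."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def nrmse(actual, predicted):
    """RMSE normalised by the range of the predicted values."""
    return rmse(actual, predicted) / (max(predicted) - min(predicted))

# Fabricated observed counts and kriging predictions at four test locations
obs = [3, 10, 6, 1]
pred = [4.0, 8.0, 5.0, 2.0]
print(round(rmse(obs, pred), 3), round(nrmse(obs, pred), 3))
```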